AITopics | ak 1

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

Zhang, Chenyang, Cao, Yuan

arXiv.org Machine Learning, May-8-2026

One widely recognized interpretation for the empirical success of transformers is their ability to perform in-context learning (ICL): pretrained transformers can perform previously unseen tasks based on demonstrations and examples in the prompt, without requiring any additional task-specific fine-tuning (Brown et al., 2020). A line of recent work interprets the ICL capability of transformers from an algorithmic perspective, viewing transformers as models that can implicitly execute certain learning algorithms on the context examples. Specifically, Garg et al. (2022) propose a theoretical framework for ICL in terms of learning a hypothesis class, and empirically show that transformers can in-context learn the linear function class. Motivated by this empirical finding, several recent works attempt to theoretically study how transformers perform in-context learning on linear regression tasks. Akyürek et al. (2022) and Von Oswald et al. (2023) construct multi-layer transformers with linear attention that can execute gradient descent on an "in-context loss" defined on the context data, thereby enabling in-context learning of linear regression.
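As a rough illustration of the algorithm those constructions are said to emulate, the sketch below runs plain gradient descent on a least-squares in-context loss over the context examples and then predicts the label of a query point. It is not a transformer and not the construction from any of the cited papers; the function names, the loss normalization, the step size, and the synthetic data are all assumptions made for this example.

```python
import random


def in_context_gd_predict(xs, ys, x_query, steps=500, lr=0.3):
    """Gradient descent on L(w) = 1/(2n) * sum_i (<w, x_i> - y_i)^2.

    Starts from w = 0, uses only the context examples (xs, ys),
    and returns the prediction <w, x_query> together with the final w.
    """
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            residual = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j in range(d):
                grad[j] += residual * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    prediction = sum(wj * xj for wj, xj in zip(w, x_query))
    return prediction, w


def make_prompt(w_true, n, rng):
    """Hypothetical helper: noiseless linear context examples."""
    xs = [[rng.gauss(0.0, 1.0) for _ in w_true] for _ in range(n)]
    ys = [sum(wj * xj for wj, xj in zip(w_true, x)) for x in xs]
    return xs, ys


rng = random.Random(0)
w_true = [1.5, -2.0, 0.5]          # task vector, unknown to the learner
xs, ys = make_prompt(w_true, 40, rng)
x_query = [0.3, -0.7, 1.2]

prediction, w_hat = in_context_gd_predict(xs, ys, x_query)
target = sum(wj * xj for wj, xj in zip(w_true, x_query))
print(prediction, target)
```

With noiseless labels and more context examples than dimensions, the iterate approaches the task vector, so the prediction on the query should closely match the true label.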

ak 1, artificial intelligence, machine learning, (18 more...)

2605.06609

Genre:

Research Report > New Finding (0.64)
Research Report > Experimental Study (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

8d2a5f7d4afa5d0530789d3066945330-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 20:14:04 GMT

A.4 Results on CIFAR-10 and CIFAR-100

In this section, we report results on CIFAR with different sizes of ResNets: ResNet-20 (RN20), ResNet-32 (RN32), ResNet-44 (RN44), ResNet-56 (RN56), and ResNet-110 (RN110). We report results on CIFAR-10 in Table 15 and results on CIFAR-100 in Table 16.

andeq, artificial intelligence, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

On Minibatch Noise: Discrete-Time SGD, Overparametrization, and Bayes

Ziyin, Liu, Liu, Kangqiao, Mori, Takashi, Ueda, Masahito

arXiv.org Machine Learning, Feb-10-2021

The noise in stochastic gradient descent (SGD), caused by minibatch sampling, remains poorly understood despite its enormous practical importance for training efficiency and generalization. In this work, we study the minibatch noise in SGD. Motivated by the observation that minibatch sampling does not always cause a fluctuation, we set out to find the conditions under which minibatch noise emerges. We first derive analytically solvable results for linear regression under various settings and compare them to the approximations commonly used to understand SGD noise. We show that some degree of mismatch between model and data complexity is needed for SGD to "cause" noise, and that such mismatch may be due to static noise in the labels or in the input, the use of regularization, or underparametrization. Our results motivate a more accurate general formulation to describe minibatch noise.
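The claim that minibatch sampling does not always cause a fluctuation can be checked numerically on a toy problem. The sketch below is not the paper's analysis; it assumes a one-dimensional linear model, synthetic Gaussian data, and hypothetical function names. It measures the variance of the minibatch gradient at the least-squares minimizer, once with noiseless labels and once with static label noise.

```python
import random


def minibatch_gradient_variance(xs, ys, batch_size, trials, rng):
    """Variance of the minibatch gradient at the full-batch minimizer.

    Model: y ~ w * x with squared loss 0.5 * (w * x - y)^2, so the
    per-sample gradient at w is (w * x - y) * x.
    """
    # Closed-form least-squares minimizer for the 1-D model.
    w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    per_sample = [(w_star * x - y) * x for x, y in zip(xs, ys)]
    grads = []
    for _ in range(trials):
        batch = rng.sample(per_sample, batch_size)  # without replacement
        grads.append(sum(batch) / batch_size)
    mean = sum(grads) / trials
    return sum((g - mean) ** 2 for g in grads) / trials


rng = random.Random(0)
w_true = 2.0
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]

ys_clean = [w_true * x for x in xs]                        # no label noise
ys_noisy = [w_true * x + rng.gauss(0.0, 0.5) for x in xs]  # static label noise

var_clean = minibatch_gradient_variance(xs, ys_clean, 8, 2000, rng)
var_noisy = minibatch_gradient_variance(xs, ys_noisy, 8, 2000, rng)
print(var_clean, var_noisy)
```

When the model fits every label exactly, each per-sample gradient vanishes at the minimizer, so the choice of minibatch makes essentially no difference. With noisy labels the per-sample gradients only cancel on average, and the minibatch gradient fluctuates.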

fluctuation, noise, regularization, (15 more...)

2102.05375

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
